<html>
<head><meta charset="utf-8"><title>Can libcore functions use SIMD as an optimization? · t-libs · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/index.html">t-libs</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html">Can libcore functions use SIMD as an optimization?</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="209830104"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209830104" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209830104">(Sep 11 2020 at 20:18)</a>:</h4>
<p>Are we allowed to use SIMD functions (from core::arch) to optimize functions in <code>libcore</code>? Assume it's behind a <code>#[cfg(target_feature="sse2")]</code> check or similar, as I know runtime feature checking isn't available to libcore.</p>
<p>SSE/SSE2 is the only ISA I'd consider here because <em>I think</em> it available for the distributed builds of libcore on x86_64, and anything higher would require something like <code>-Zbuild-std</code> + <code>-Ctarget-cpu=native</code>...  I think?  But maybe this is wrong, because of things like <code>-Ctarget-feature=-sse2</code>...</p>
<p>I couldn't find any cases where we do it now, which makes me think it's not allowed, but I'd like to know for sure.</p>



<a name="209830465"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209830465" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> simulacrum <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209830465">(Sep 11 2020 at 20:22)</a>:</h4>
<p>I think it would be fine to do this, though we'll want to be careful to make sure that each code path is still exercised in CI -- if we can't easily do that then I would consider that a deal breaker to doing this.</p>



<a name="209832639"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209832639" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209832639">(Sep 11 2020 at 20:43)</a>:</h4>
<p>I recently reviewed every issue regarding SIMD in the compiler (indeed, every issue about SIMD opened against rust-lang/rust). Some reviews for vectorized implementations of various things have been done in the past but did not yield apparent performance improvements over the current in the explicit SIMD versions, which has somewhat dampened enthusiasm for the idea.</p>



<a name="209833136"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209833136" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209833136">(Sep 11 2020 at 20:47)</a>:</h4>
<p>It has been a want-to-have though, it's just about finding the things that actually <em>do</em> speed up when explicitly vectorized.</p>



<a name="209835295"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209835295" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Josh Triplett <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209835295">(Sep 11 2020 at 21:06)</a>:</h4>
<p><span class="user-mention" data-user-id="209168">@Thom Chiovoloni</span> I think it'd be perfectly fine to do so, with a few caveats:</p>
<ul>
<li>There may be x86_64 targets where it <em>isn't</em> allowed, such as embedded/kernel targets. It has to be possible to prevent it.</li>
<li>It'd be nice to avoid unexpected overheads from calling CPUID.</li>
<li>It'd be nice to make it possible to build several versions of functions and choose at runtime at a higher level. For instance, we could build multiple copies of libstd (when distros build it as a shared library) and install them to different directories, which the dynamic linker can decide between at library load time based on CPU capabilities.</li>
</ul>



<a name="209835869"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209835869" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209835869">(Sep 11 2020 at 21:13)</a>:</h4>
<p>currently imagining a lazily-init'd CPUID call for feature detection (OnceCell style?) that then aligns SIMD-dependent functions to the appropriate usages.</p>



<a name="209842288"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209842288" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> scottmcm <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209842288">(Sep 11 2020 at 22:22)</a>:</h4>
<p>I agree that <code>sse2</code> is probably the only one that would work, for the same kind of reason as <code>debug_assert</code> not working.</p>
<p>Did you have any particular intrinsics in might that you wanted to use, <span class="user-mention" data-user-id="209168">@Thom Chiovoloni</span>?</p>



<a name="209842406"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209842406" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209842406">(Sep 11 2020 at 22:23)</a>:</h4>
<p><span class="user-mention silent" data-user-id="116122">simulacrum</span> <a href="#narrow/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F/near/209830465">said</a>:</p>
<blockquote>
<p>I think it would be fine to do this, though we'll want to be careful to make sure that each code path is still exercised in CI -- if we can't easily do that then I would consider that a deal breaker to doing this.</p>
</blockquote>
<p>Yeah, of course. Open to suggestions:</p>
<p>Naively, my thought is (and this is only hard because <code>#[cfg(test)]</code> isn't available I think, so let me know if I'm wrong about that or if there's another obvious option):</p>
<p>Expose the scalar/fallback impl under a perma-unstable <code>#[feature(core_private_simd_fallback)]</code>. Tests would exercise both this impl, and whichever impl we're shipping.</p>
<p>The big downside here is that we still would include this code in non-testing scenarios, despite it being dead code outside test scenarios. Maybe this is fine, or maybe it can be avoided by having x.py pass a special <code>--cfg</code> during tests and such? (Or maybe this already exists? Again I'm assuming <code>#[cfg(test)]</code> doesn't work becasue these are integration tests and not unit tests)...</p>
<p>That said, for a lot of cases, the fallback impl might still get used for inputs that are too small for the vectorized impl, so perhaps it wouldn't be dead code, and we could punt on this...</p>
<p>Alternatively, maybe the fact that CI is (presumably) run on both machines with and without support (I'd rather have it be easier to run the tests on both impls locally, so I don't like this, but it's the easiest option).</p>
<p><span class="user-mention" data-user-id="239881">@Josh Triplett</span> <span class="user-mention" data-user-id="281757">@Jubilee</span>: Honestly I'm just asking about static dispatch, e.g. <code>#[cfg(target_feature = "sse2")]</code>, which <em>should</em> compile completely away for builds without that feature. That is: no cpuid call would be done/needed. This keeps things simple, easy to test, and is a big win in practice.</p>
<p>Concerns about cases like the kernel were why I asked this though, but I assume they already would have to build a custom libcore to avoid these instructions being generated by LLVM? (Since from checking it looks like the stdlib that gets distributed via rustup for x86_64 platforms is compiled with that target feature on and is already using these instructions in many cases).</p>
<p><span class="user-mention silent" data-user-id="281757">Jubilee</span> <a href="#narrow/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F/near/209833136">said</a>:</p>
<blockquote>
<p>It has been a want-to-have though, it's just about finding the things that actually <em>do</em> speed up when explicitly vectorized.</p>
</blockquote>
<p>Obviously this would come with benchmarks, and the win would also have to be significant enough to justify the additional harder-to-maintain code. (And it also can't regress stuff like small inputs).</p>
<p>There are several places where I'd like to do this (and it looks like I'll have a bunch of free time soon, and have recently learned how to test libcore without having to do a build of the compiler <span aria-label="sweat smile" class="emoji emoji-1f605" role="img" title="sweat smile">:sweat_smile:</span>). Anyway, the most obvious case to me is string and byte-slice processing, since it's such a big win to avoid per-byte work (you can see this in the fact that there are SWAR impls of some of these already, some of which I've contributed).</p>



<a name="209843135"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209843135" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209843135">(Sep 11 2020 at 22:32)</a>:</h4>
<p><span class="user-mention" data-user-id="125270">@scottmcm</span> Are there any particular ones which would be a problem?</p>
<p>In general this would be several intrinsics used to optimize a function that operates on a input slice multiple elements at a time. (In practice this is really the only way to get good SIMD perf IME), and so it's hard to come up with the exact set I'd need without just writing the code.</p>
<p>Anyway the time I most often feel the pain is when I call a function on &amp;str which does per-byte processing, so the <code>_mm_*_ep[iu]8</code> functions, but definitely others too.</p>
<p>If it helps, I don't really have any plans that would use the float operations, since libstd barely has any functions that use floats. Uh... <em>maybe</em> PartialEq/PartialOrd specializations for e.g. <code>[f32]</code>? But that's assuming it doesn't autovectorize into something half decent already (and it absolutely might).</p>



<a name="209846281"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209846281" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209846281">(Sep 11 2020 at 23:19)</a>:</h4>
<p><span aria-label="sweat smile" class="emoji emoji-1f605" role="img" title="sweat smile">:sweat_smile:</span> I tried not to say "gee, it sounds like you guys want a full portable SIMD story realized."<br>
But honestly that's kind of what this might need, since even static feature detection needs work rn, as far as I can tell. :/</p>
<p>One of the things that people thought was going to improve but didn't in actuality was SipHash vectorizing. It apparently is already well-enough vectorized by LLVM or something? <span aria-label="shrug" class="emoji emoji-1f937" role="img" title="shrug">:shrug:</span></p>
<p>Floats often autovectorize. And float vectorization is actually kind of almost dangerous because it appears to have wildly different results than scalar code (NaNs do different things and subnormal numbers...), I would actually say not to introduce float vectorization to the compiler yet without some thorough test suites that demonstrate it's not going to mess up anything.</p>



<a name="209848854"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209848854" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Amanieu <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209848854">(Sep 12 2020 at 00:03)</a>:</h4>
<p>In the case of SSE2 we are already testings both branches in CI: the <code>i686</code> target includes SSE2 while the <code>i586</code> target doesn't.</p>



<a name="209848863"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209848863" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Amanieu <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209848863">(Sep 12 2020 at 00:03)</a>:</h4>
<p>(x86_64 always supports SSE2)</p>



<a name="209848945"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209848945" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Amanieu <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209848945">(Sep 12 2020 at 00:04)</a>:</h4>
<p>In fact hashbrown (used by libstd) already does this: <a href="https://github.com/rust-lang/hashbrown/blob/0f386166035412b11e4f82263a750d9a6e0fd97a/src/raw/mod.rs#L12">https://github.com/rust-lang/hashbrown/blob/0f386166035412b11e4f82263a750d9a6e0fd97a/src/raw/mod.rs#L12</a></p>



<a name="209849347"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209849347" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209849347">(Sep 12 2020 at 00:12)</a>:</h4>
<p>Aha, good. <span aria-label="eyes" class="emoji emoji-1f440" role="img" title="eyes">:eyes:</span> Hmmm...</p>



<a name="209850743"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209850743" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Lokathor <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209850743">(Sep 12 2020 at 00:40)</a>:</h4>
<p><span class="user-mention" data-user-id="281757">@Jubilee</span> can you say more about what's wrong with static detection and dispatch?</p>



<a name="209851700"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209851700" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209851700">(Sep 12 2020 at 01:01)</a>:</h4>
<p><a href="https://github.com/rust-lang/rust/issues/64609">https://github.com/rust-lang/rust/issues/64609</a> This soundness hole is what I was primarily thinking of re: x86, might not apply here. I am still mulling over some of what I have read and revisiting it, which is why I expressed the uncertainty of "as far as I can tell."</p>



<a name="209852405"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209852405" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Lokathor <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209852405">(Sep 12 2020 at 01:20)</a>:</h4>
<p>Ah, this issue. Yes, it's a potential problem. However, that's a problem with <code>#[target_feature(enable = "foo")]</code>.</p>
<p>If you're <em>only</em> using <code>#[cfg(target_feature = "foo")]</code> to include code or not (eg: on a block), then you can't hit that particular issue. All functions in the build will exist with the same cpu feature set, so you can't get ABI incompatibility problems.</p>



<a name="209855415"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209855415" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209855415">(Sep 12 2020 at 02:56)</a>:</h4>
<p><span class="user-mention silent" data-user-id="281757">Jubilee</span> <a href="#narrow/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F/near/209846281">said</a>:</p>
<blockquote>
<p><span aria-label="sweat smile" class="emoji emoji-1f605" role="img" title="sweat smile">:sweat_smile:</span> I tried not to say "gee, it sounds like you guys want a full portable SIMD story realized."<br>
But honestly that's kind of what this might need, since even static feature detection needs work rn, as far as I can tell. :/</p>
</blockquote>
<p>This is why I asked, although it sounds like it works (hashbrown uses it). The link you gave is about runtime target feature stuff, e.g. <code>#[target_feature]</code> and not <code>#[cfg(target_feature)]</code>. They are easy to confuse, though.</p>
<blockquote>
<p>One of the things that people thought was going to improve but didn't in actuality was SipHash vectorizing. It apparently is already well-enough vectorized by LLVM or something? <span aria-label="shrug" class="emoji emoji-1f937" role="img" title="shrug">:shrug:</span></p>
</blockquote>
<p>SipHash sounds very hard to do, tbh. As an example of something more normal, but still relevant:</p>
<p>For a day or two I've been working on a SWAR/int-simd/treat-usize-as-bytevec version of Chars::count\*, which took longer than I'd like on a big file. (This should be up as a PR tomororw, it's done but needs more thorough testing, and I'd like to verify that it performs ok on things other than x86_64). It was pretty easy for me to copypaste it the SWAR code, and rewrite it to be a SSE2 version.</p>
<p>Here are the benchmarks for the old version, my SWARized version, and an SSE2 version: <a href="https://gist.github.com/thomcc/7bd1d22dfe3d31c1c4cc0de7771729aa">https://gist.github.com/thomcc/7bd1d22dfe3d31c1c4cc0de7771729aa</a> . You'll note that the SSE2 impl is around 10x faster (or more) of the old impl, and 2x faster than the SWAR version. This is actually <em>less</em> of an improvement than you'd normally get, because LLVM actually autovectorizes the naive Chars::count (see <a href="https://godbolt.org/z/5bvd7s">https://godbolt.org/z/5bvd7s</a>)! (Also I spent 0 time tuning it)</p>
<p>\* (Aside: Of course, Chars::count() is probably not a function worth having both a SWAR and a SSE2 impl of is worth it, since it's often an antipattern to care about the number of codepoints (I know I'm going to have to make a case for why it's worth even maintaining a SWARed version), and between SWAR and SSE2, the SWAR works everywhere so it's more important. This is not the point here)</p>
<blockquote>
<p>Floats often autovectorize. And float vectorization is actually kind of almost dangerous because it appears to have wildly different results than scalar code (NaNs do different things and subnormal numbers...), I would actually say not to introduce float vectorization to the compiler yet without some thorough test suites that demonstrate it's not going to mess up anything.</p>
</blockquote>
<p>So, I said I don't have real plans to do anything with floats, but... float FUD is a pet peeve of mine, and this is not accurate. Pretty much all float operations on x86_64 use the vector registers, and this is in part because it's actually a compliant impl of ieee754, unlike the x87 stack. (And it's faster).</p>
<p>(Also autovectorization is usually pretty bad and extremely fragile in practice. It also kicks in more for integers IME than floats... That said, again, the stdlib doesn't really have much float code that I can think of where this would be worthwhile)</p>



<a name="209861992"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209861992" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209861992">(Sep 12 2020 at 03:24)</a>:</h4>
<p>Neat!</p>



<a name="209863071"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209863071" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Jubilee <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209863071">(Sep 12 2020 at 04:01)</a>:</h4>
<p>I'll admit that does clarify many things I was a bit confused about.<br>
So the other thing about floats is that LLVM's  float implementation is probably busted, so it's hard to take advantage of IEEE754 compliance if we're just hoping on LLVM not getting in the architecture's way. ( though I imagine it's possible the way it is busted doesn't matter for the implementations you're thinking of. ) (and we're probably not any more or less busted + / - an explicit SIMD implementation of anything? probably. )</p>



<a name="209867952"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209867952" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pachi <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209867952">(Sep 12 2020 at 06:44)</a>:</h4>
<p><span class="user-mention" data-user-id="209168">@Thom Chiovoloni</span> your knowledge of floats could be very interesting to the <span class="user-group-mention" data-user-group-id="1916">@WG-const-eval</span> group, as const eval of float code is something that would be desirable but, afaict, there are concerns about getting different results. <span class="user-mention" data-user-id="124288">@oli</span> and <span class="user-mention" data-user-id="120791">@RalfJ</span> are the ones to get in touch with (I say it just in case you're looking for good ways to contribute)</p>



<a name="209870778"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209870778" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> RalfJ <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209870778">(Sep 12 2020 at 08:19)</a>:</h4>
<p>From what I can see the thread here has nothing to do with const-eval though?</p>



<a name="209870824"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209870824" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> RalfJ <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209870824">(Sep 12 2020 at 08:20)</a>:</h4>
<p>However, if libstd starts to use SSE2, it'd be rgeat if we could not break Miri. :) Something like what hasbrown does works, where <code>cfg(miri)</code> is a signal to opt-out of the vectorized implementation.</p>



<a name="209871760"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209871760" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pachi <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209871760">(Sep 12 2020 at 08:49)</a>:</h4>
<p><span class="user-mention" data-user-id="120791">@RalfJ</span> weren't there some questions around the use of floats (and getting different results at compiled and runtime) in const contexts? Maybe I misunderstood it.</p>



<a name="209871772"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209871772" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> RalfJ <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209871772">(Sep 12 2020 at 08:50)</a>:</h4>
<p>not in this thread as far as I can see, no</p>



<a name="209871819"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209871819" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> RalfJ <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209871819">(Sep 12 2020 at 08:50)</a>:</h4>
<p>that question also exists around floats, but it's separate from explicitly using SIMD in libcore</p>



<a name="209871896"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209871896" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pachi <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209871896">(Sep 12 2020 at 08:53)</a>:</h4>
<p>Well, my comment was just that <span class="user-mention" data-user-id="209168">@Thom Chiovoloni</span> looks like very familiar with float semantics and that could be useful in that other wg. But yes, not explicitly the simd part.</p>



<a name="209871939"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209871939" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> pachi <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209871939">(Sep 12 2020 at 08:54)</a>:</h4>
<p>Sorry if this might disturb you</p>



<a name="209872971"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209872971" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> Thom Chiovoloni <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209872971">(Sep 12 2020 at 09:27)</a>:</h4>
<p>(Const-eval for floats seems off topic yes, but if you're in dire need of people who have floating point intuition/opinions/experience, I can take a look at issues (I certainly have the "opinions" part, anyway). I probably should now that I have some spare time, since I'd like floats to continue working well. That said, it's been a while since I did nontrivial numerical work, and probably have to page it all back in)</p>
<p>That said, there's probably some discussion eventually to be had about the fact that most of these sorts of optimizations aren't possible when performing constant evaluation, which could force us to choose between a fast impl and a <code>const fn</code> impl... E.g. marking a function as const fn prevents us from ever adding a vectorized version... That said, for now that's okay.</p>
<p>Re: keeping miri working, I use it almost every day so hope I wouldn't forget and break it, but I was going to just going to copy hashbrown's cfg expression verbatim, since it already works.</p>



<a name="209873990"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/219381-t-libs/topic/Can%20libcore%20functions%20use%20SIMD%20as%20an%20optimization%3F/near/209873990" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> RalfJ <a href="https://rust-lang.github.io/zulip_archive/stream/219381-t-libs/topic/Can.20libcore.20functions.20use.20SIMD.20as.20an.20optimization.3F.html#209873990">(Sep 12 2020 at 09:57)</a>:</h4>
<blockquote>
<p>That said, there's probably some discussion eventually to be had about the fact that most of these sorts of optimizations aren't possible when performing constant evaluation, which could force us to choose between a fast impl and a const fn impl... E.g. marking a function as const fn prevents us from ever adding a vectorized version... That said, for now that's okay.</p>
</blockquote>
<p>plans have been aired several times (if only I remember where...) to have a way for a <code>const fn</code> to provide a "const-time fallback implementation". it's hard, in particular to figure out what the exact contract for that function is, but not impossible.<br>
EDIT: <a href="https://github.com/rust-lang/const-eval/issues/7">https://github.com/rust-lang/const-eval/issues/7</a> seems relevant</p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>